add round-trip for share derived via interpolation #2
base: master
BenWestgate left a comment
concept ACK
I agree we should do something to prevent this unexpected behavior, but it's caused by unintuitive mathematical properties of finite fields.
The reason is that GF(32) interpolation does not preserve padding; it operates on full 5-bit values. So by storing bytes and throwing away the padding, you no longer have enough information to reconstruct the same share you extracted the bytes from.
You MUST pass padding to reconstruct a derived string (one produced by interpolation).
I don't know how to enforce that at the library level, any ideas? Especially since we both agree a default is also a nice feature, but here it's a footgun.
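To make the padding point concrete, here is a minimal Python sketch of the bytes-to-5-bit regrouping, assuming the usual big-endian bech32-style conversion where the padding occupies the low bits of the final symbol. The helpers `bytes_to_u5` and `u5_to_bytes` are hypothetical, not this library's API. A 16-byte secret becomes 26 symbols (130 bits), so four different last characters all decode to the same bytes:

```python
def bytes_to_u5(data: bytes, pad_val: int = 0) -> list[int]:
    """Regroup bytes into 5-bit symbols; pad_val fills the final partial symbol."""
    nbits = 8 * len(data)
    pad_bits = -nbits % 5
    if not 0 <= pad_val < (1 << pad_bits):
        raise ValueError(f"pad_val must fit in {pad_bits} bits")
    value = (int.from_bytes(data, "big") << pad_bits) | pad_val
    count = (nbits + pad_bits) // 5
    return [(value >> (5 * (count - 1 - i))) & 31 for i in range(count)]


def u5_to_bytes(symbols: list[int]) -> bytes:
    """Regroup 5-bit symbols into bytes, silently discarding the padding bits."""
    value = 0
    for s in symbols:
        value = (value << 5) | s
    nbytes = 5 * len(symbols) // 8
    return (value >> (5 * len(symbols) - 8 * nbytes)).to_bytes(nbytes, "big")


seed = bytes(range(16))  # 128-bit secret: 26 symbols, 2 of the 130 bits are padding
variants = [bytes_to_u5(seed, pad) for pad in range(4)]
assert all(u5_to_bytes(v) == seed for v in variants)  # all four decode to the same bytes
print([v[-1] for v in variants])  # only the last symbol differs -> [28, 29, 30, 31]
```

Because `u5_to_bytes` drops the 2 padding bits, the bytes alone cannot tell you which of the four strings you started from.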
assert d.s == "ms13k00ldp4v5nw8lph96x47mjxzgwjexe44p32swkq99e0w"

# now round-trip d share ('d' is derived via interpolation, NOT via 'from_seed')
dd = Codex32String.from_seed(d.data, "ms13k00ld")
You can't do this. You can only call .from_seed without passing pad_val for the k initial strings; derived strings MUST be passed their padding to round-trip.
You would need to be able to do this:
dd = Codex32String.from_seed(d.data, "ms13k00ld", d.pad_val)
This version's Codex32String lacks a pad_val property; I'm working on an update that adds one.
No matter what padding style we use, the padding is less than a full 5-bit value, so it is not an element of GF(32). It therefore will not interpolate into derived shares while maintaining any linear relationship that would allow round-tripping from bytes (GF(256)) to GF(32)-interpolated strings without passing the padding.
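A small sketch of why interpolation does not respect padding, assuming codex32 uses the bech32 field GF(32) with modulus x^5 + x^3 + 1 and that a share index's field value is its position in the bech32 alphabet. It models only the last payload symbol of a 128-bit secret, with k = 2 stored shares "a" and "c" and a derived share "d". Even when both stored symbols have zero padding bits, some combinations interpolate to a "d" symbol whose padding bits are nonzero. `derived_symbol` is a hypothetical helper:

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet; position = field value
PAD_MASK = 0b11  # for a 16-byte secret, the low 2 bits of the last symbol are padding


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(32), reducing modulo x^5 + x^3 + 1 (assumed codex32/bech32 field)."""
    result = 0
    for _ in range(5):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b100000:
            a ^= 0b101001
    return result


def gf_inv(a: int) -> int:
    """Inverse via a^30, since the multiplicative group of GF(32) has order 31."""
    result = 1
    for _ in range(30):
        result = gf_mul(result, a)
    return result


def derived_symbol(ya: int, yc: int, xa: int, xc: int, xd: int) -> int:
    """Two-point (k = 2) Lagrange interpolation at xd; subtraction here is XOR."""
    la = gf_mul(xd ^ xc, gf_inv(xa ^ xc))
    lc = gf_mul(xd ^ xa, gf_inv(xc ^ xa))
    return gf_mul(ya, la) ^ gf_mul(yc, lc)


xa, xc, xd = (CHARSET.index(ch) for ch in "acd")
# Every last-symbol value whose padding bits are zero
zero_padded = [v for v in range(32) if (v & PAD_MASK) == 0]
leaky = [(ya, yc) for ya in zero_padded for yc in zero_padded
         if derived_symbol(ya, yc, xa, xc, xd) & PAD_MASK]
print(f"{len(leaky)} of {len(zero_padded) ** 2} zero-padded pairs give 'd' nonzero padding")
```

This does not depend on the chosen indices: the zero-padded symbols form a 3-dimensional subspace over GF(2), and multiplying by any Lagrange coefficient outside {0, 1} cannot map that subspace into itself, because GF(32) has no subfields between GF(2) and itself.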
The only string whose data you should care about after construction is "s", so the fact that other share indices can return data is more of a curiosity. Maybe .data should raise InvalidShareIndex or return None if share_idx != "s", to prevent this misuse.
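A sketch of that guard, using a hypothetical minimal class (`ShareSketch`) rather than the real Codex32String, with the `InvalidShareIndex` name suggested above defined locally as an assumption:

```python
class InvalidShareIndex(ValueError):
    """Assumed exception name, taken from the suggestion above."""


class ShareSketch:
    """Hypothetical minimal stand-in for Codex32String, not the library's class."""

    def __init__(self, share_idx: str, payload: bytes) -> None:
        self.share_idx = share_idx
        self._payload = payload

    @property
    def data(self) -> bytes:
        # Only the secret share "s" exposes its payload as seed bytes
        if self.share_idx != "s":
            raise InvalidShareIndex(f"share {self.share_idx!r} is not the secret 's'")
        return self._payload
```

Returning None would be the softer option; raising makes the misuse impossible to miss.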
What is your exact use case where you really need to store ALL the shares as bytes and recover back to codex32?
I'm able to do this, which fixes this test case:
dd = Codex32String.from_seed(d.data, "ms13k00ld", pad_val=1)
but I have no idea how I would arrive at pad_val=1 other than by grinding it against the string I already know (which won't be the case in real life).
> I don't know how to enforce that at the library level, any ideas?
Not really... other than grinding the correct pad_val via round-trips right after constructing the derived share (very meh).
> What is your exact use case where you really need to store ALL the shares as bytes and recover back to codex32?
My general idea is that I can use individual shares as normal secrets: load them onto hardware wallets (HWWs), sign with them, etc. For instance, the user uses one HWW to do the Shamir split and exports the generated/derived shares (as QR codes, for instance) to N other devices. The derived shares are loaded onto those devices, and the devices are geo-distributed. They then serve as decoys that are also fully functional signers. When the secret S is needed, the user just collects K devices and does some QR scanning to recover S on an empty HWW.
For this I thought I could use these from_seed/to_seed round-trips. Secure element storage is limited, so for me a byte encoding is preferable to u5.
But now it seems this was never the intended purpose of the non-secret shares; they seem to be just recovery tools, i.e. data with one and only one purpose: to recover share S (which is kind of a pity, tbh). Am I reading this correctly?
I also think that if round-trips with derived shares can be achieved somehow, even if passing padding is necessary, that would be desirable.
> but I have no idea how did I get to the pad_val=1 besides grinding it against the string which I already know (which won't be the case in real life)
You had to grind it because you discarded the pad_val. Without the padding you don't know the full last character, so you might recover a different last data character: interpolate_at operates on 5-bit values, not bytes.
> any ideas?
> not really... besides grinding correct pad_val... (very meh)
It may be possible if you give up being able to construct "non-encoded" shares from bytes data and instead accept constructing a Codex32ShareSet object with a from_bytes (or from_seeds) factory, then use an interpolate_at(share_idx) method of that share set object.
> What is your exact use case...?
> generated/derived shares as QR codes for instance.
Make sure to skim this Compact CodexQR discussion before speccing a QR design; it's the analog of Compact SeedQR. I found a fun way to fit 128-bit codex32 share data into 21x21 QR codes by dropping some of the identifier.
Whatever solution we find for Codex32ShareSet.from_bytes(header, dict) would be very helpful there, as well as here.
> These then serve as decoy, fully functional signers.
This seems useful!
> For this I thought I can use this from_seed/to_seed round-trips.
You may be able to round-trip the share set via from_seeds/to_seeds, or the .data of individual shares, but we need to define the correct Codex32ShareSet.from_seeds class method to make this possible.
The source of truth in a Codex32ShareSet should be the common header and the byte payloads of "s", "a", "c" for k = 3, or maybe "a", "c", "d". CRC padding, which does not interpolate, is slightly more useful on a share you can actually find and verify it on than on an unknown share you would first have to interpolate to before checking whether it validates.
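A rough sketch of the share-set idea under stated assumptions: a hypothetical `ShareSetSketch` (standing in for the proposed Codex32ShareSet) stores exactly k payloads plus each one's pad_val as its source of truth, and `interpolate_at` interpolates every 5-bit payload symbol independently. It assumes the bech32 field (modulus x^5 + x^3 + 1) and charset-position share indices, and it ignores the header and checksum, which a real implementation would also have to handle. None of these names are the library's API:

```python
from dataclasses import dataclass

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def gf_mul(a: int, b: int) -> int:
    # GF(32) multiply modulo x^5 + x^3 + 1 (assumed to match codex32's field)
    result = 0
    for _ in range(5):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b100000:
            a ^= 0b101001
    return result


def gf_inv(a: int) -> int:
    result = 1
    for _ in range(30):  # a^30 == a^-1 because the multiplicative group has order 31
        result = gf_mul(result, a)
    return result


def bytes_to_u5(data: bytes, pad_val: int) -> list[int]:
    pad_bits = -8 * len(data) % 5
    value = (int.from_bytes(data, "big") << pad_bits) | pad_val
    count = (8 * len(data) + pad_bits) // 5
    return [(value >> (5 * (count - 1 - i))) & 31 for i in range(count)]


def u5_to_bytes(symbols: list[int]) -> tuple[bytes, int]:
    # Returns the payload bytes AND the padding value, so nothing is thrown away
    value = 0
    for s in symbols:
        value = (value << 5) | s
    nbytes = 5 * len(symbols) // 8
    pad_bits = 5 * len(symbols) - 8 * nbytes
    return (value >> pad_bits).to_bytes(nbytes, "big"), value & ((1 << pad_bits) - 1)


def lagrange_at(points: dict[int, int], x: int) -> int:
    # Lagrange interpolation in GF(32); addition and subtraction are both XOR
    total = 0
    for xi, yi in points.items():
        num = den = 1
        for xj in points:
            if xj != xi:
                num = gf_mul(num, x ^ xj)
                den = gf_mul(den, xi ^ xj)
        total ^= gf_mul(yi, gf_mul(num, gf_inv(den)))
    return total


@dataclass
class ShareSetSketch:
    """Hypothetical stand-in for a Codex32ShareSet: k stored shares are the source of truth."""
    threshold: int
    shares: dict[str, tuple[bytes, int]]  # share index -> (payload bytes, pad_val)

    @classmethod
    def from_bytes(cls, threshold: int, shares: dict[str, tuple[bytes, int]]) -> "ShareSetSketch":
        if len(shares) != threshold:
            raise ValueError("exactly k share payloads are required")
        if len({len(payload) for payload, _ in shares.values()}) != 1:
            raise ValueError("all payloads must have the same length")
        return cls(threshold, dict(shares))

    def interpolate_at(self, share_idx: str) -> list[int]:
        # Interpolate every 5-bit symbol position independently at the target index
        columns = {CHARSET.index(i): bytes_to_u5(p, pad) for i, (p, pad) in self.shares.items()}
        length = len(next(iter(columns.values())))
        x = CHARSET.index(share_idx)
        return [lagrange_at({xi: sym[pos] for xi, sym in columns.items()}, x)
                for pos in range(length)]


a, c = bytes(range(16)), bytes(range(16, 32))
share_set = ShareSetSketch.from_bytes(2, {"a": (a, 1), "c": (c, 2)})
d_bytes, d_pad = u5_to_bytes(share_set.interpolate_at("d"))
# Any k shares, stored as bytes plus pad_val, rebuild the same polynomial
rebuilt = ShareSetSketch.from_bytes(2, {"a": (a, 1), "d": (d_bytes, d_pad)})
assert u5_to_bytes(rebuilt.interpolate_at("c")) == (c, 2)
```

The final assertion is the property the thread is after: once the derived share "d" is stored as bytes together with its pad_val, any k shares rebuild the same polynomial and therefore the same set.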
> Secure element storage is limited so for me byte encoding is more desired instead of u5.
A 21x21 QR holds only 137.2 bits if using base45 alphanumeric encoding, or 138.2 bits if also using the kanji, byte and numeric modes. So it'd be excellent for us to define a compact encoding of share data: the bare minimum needed to always recover the correct secret, with whatever is left used to prevent user errors.
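The single-mode capacity figures can be sanity-checked with a short Python calculation, assuming a version 1 (21x21) QR at error-correction level L (19 data codewords), one segment per mode, and the standard 4-bit mode indicator plus the version 1-9 character-count lengths. It reproduces the roughly 137-bit alphanumeric figure; the 138.2-bit mixed-mode figure is not reproduced here:

```python
import math

DATA_BITS = 19 * 8  # version 1 (21x21), error-correction level L: 19 data codewords
MODES = {
    # mode: (character-count bits, encoded bits for n chars, information bits per char)
    "numeric": (10, lambda n: 10 * (n // 3) + (0, 4, 7)[n % 3], math.log2(10)),
    "alphanumeric": (9, lambda n: 11 * (n // 2) + 6 * (n % 2), math.log2(45)),
    "byte": (8, lambda n: 8 * n, 8.0),
}


def max_chars(count_bits, payload_bits) -> int:
    """Largest n that fits after the 4-bit mode indicator and the count field."""
    n = 0
    while 4 + count_bits + payload_bits(n + 1) <= DATA_BITS:
        n += 1
    return n


capacity = {}
for mode, (count_bits, payload_bits, bits_per_char) in MODES.items():
    n = max_chars(count_bits, payload_bits)
    capacity[mode] = (n, n * bits_per_char)
    print(f"{mode}: {n} chars, ~{n * bits_per_char:.1f} bits of information")
```

Subtracting a 128-bit payload leaves only about 9 bits of alphanumeric capacity, or 8 in byte mode, for everything else.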
> But now, it seems this was never intended purpose of the non-secret shares, which seems more as just recovery tools, aka data with one and only one purpose - to recover share S (which is kind of pity tbh). Am I reading this correctly?
Yes, this is not their intended purpose, but they do contain randomness. I think your idea is a cool and efficient use of that otherwise wasted random data, which SSS needs anyway, so it's worth pursuing IF it can be done securely (i.e. k-1 shares must not reveal any more information about "s" than, at most, its padding bits).
> I also think that if round-trips with derived shares can be achieved somehow, even if passing padding is necessary, it should be desired.
I agree. Recovering seeds from bytes alone is non-trivial, but a solution should exist; let's find it. You'll see this bytes vs. 130-bits question tripped up Andrew in the QR discussion; it's always surprising how padding behaves as the finite field changes.
@scgbckbone do you still want me to come up with a way to recover the same seed from any share's bytes (not just the initial strings') without passing padding? I think it's technically possible, but I haven't thought about how to do it yet.
That would be great! I wanted to try it myself but don't have time for it atm. IMO it is very useful.
I'm writing a test for it. The first thing I notice is that to go from bytes to a share, you need the share index.
So what you're really asking for is a way for a list of k share-indexed bytes payloads to always recover the correct seed.
Unless you're implying the bytes payload should also derive the share index so it doesn't need to be stored & passed with the bytes. That requires some grinding when generating initial share payloads so that all n shares have the relation that computes the share index. It's feasible for up to 10 shares or so and gets rather slow towards 15-20.
Question 1:
- Store the share index with the bytes
- Derive the share index from the bytes
Question 2:
- May we assume the BIP32 fingerprint of the secret bytes is always available?
- That allows a brute-force search over every share index and padding combination until the recovered seed matches the fingerprint, without forcing special derivation rules for padding and share index.
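A generic sketch of the Question 2 search, assuming the caller supplies two hypothetical callables: `recover` (e.g. interpolate the candidate shares to "s" and return its bytes) and `fingerprint` (e.g. the BIP32 master fingerprint). Toy stand-ins are used so the example runs without a BIP32 library. `search_space` estimates how many candidates a real search would try, assuming 31 possible indices (every bech32 character except "s") and 2 padding bits per 16-byte share:

```python
import hashlib
import math
from itertools import permutations, product
from typing import Callable, Optional, Sequence

Share = tuple[str, bytes, int]  # (share index, payload bytes, pad_val)


def search_space(k: int, n_indices: int, pad_bits: int) -> int:
    """Candidates to try when neither share indices nor padding were stored."""
    return math.perm(n_indices, k) * (1 << pad_bits) ** k


def brute_force(payloads: Sequence[bytes],
                recover: Callable[[list[Share]], bytes],
                fingerprint: Callable[[bytes], bytes],
                target: bytes,
                alphabet: str,
                pad_bits: int) -> Optional[list[Share]]:
    """Try every index assignment and padding combination until the fingerprint matches."""
    for indices in permutations(alphabet, len(payloads)):
        for pads in product(range(1 << pad_bits), repeat=len(payloads)):
            shares = list(zip(indices, payloads, pads))
            if fingerprint(recover(shares)) == target:
                return shares
    return None


# Toy stand-ins only: a real `recover` would interpolate to "s", and a real
# `fingerprint` would be the 4-byte BIP32 master key fingerprint.
def toy_recover(shares: list[Share]) -> bytes:
    return b"".join(idx.encode() + payload + bytes([pad]) for idx, payload, pad in shares)


def toy_fingerprint(seed: bytes) -> bytes:
    return hashlib.sha256(seed).digest()


truth = [("a", b"x", 1), ("c", b"y", 0)]
found = brute_force([b"x", b"y"], toy_recover, toy_fingerprint,
                    toy_fingerprint(toy_recover(truth)), "acd", pad_bits=1)
print(found, search_space(3, 31, 2))
```

With k = 3 that is 1,726,080 candidates. A real BIP32 fingerprint is only 4 bytes, so each wrong candidate matches with probability about 2^-32, giving roughly a 0.04% chance of a false match over the whole search; storing even a few index or padding bits shrinks both the work and that risk.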
Question 3:
Should this recovery-from-bytes method work for bytes extracted from:
- Any share set, including ones with random padding as per BIP93 and the codex book
- Only share sets specially constructed to facilitate recovery from reduced information
Question 4:
Would it be easier for you to just prepend the share index bits to the payload bytes and append the padding bits (zero-padded to a whole number of bytes as necessary)?
- There does not seem to be anything wrong with BIP32 taking this extra non-random data as the seed, but your software could also discard it to follow BIP93 exactly.
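One possible layout for Question 4, sketched in Python: a 5-bit share index, then the payload bytes, then the padding bits, zero-filled to a whole byte. For a 16-byte payload this is 5 + 128 + 2 = 135 bits, stored in 17 bytes. `pack` and `unpack` are hypothetical helpers, not part of any spec:

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def pack(share_idx: str, payload: bytes, pad_val: int) -> bytes:
    """Hypothetical layout: [5-bit index][payload][pad bits][zero fill to a byte]."""
    pad_bits = -8 * len(payload) % 5
    if not 0 <= pad_val < (1 << pad_bits):
        raise ValueError(f"pad_val must fit in {pad_bits} bits")
    nbits = 5 + 8 * len(payload) + pad_bits
    value = (CHARSET.index(share_idx) << 8 * len(payload)) | int.from_bytes(payload, "big")
    value = (value << pad_bits) | pad_val
    fill = -nbits % 8
    return (value << fill).to_bytes((nbits + fill) // 8, "big")


def unpack(blob: bytes, payload_len: int) -> tuple[str, bytes, int]:
    """Inverse of pack(); the payload length must be known (e.g. 16 for a 128-bit secret)."""
    pad_bits = -8 * payload_len % 5
    nbits = 5 + 8 * payload_len + pad_bits
    value = int.from_bytes(blob, "big") >> (8 * len(blob) - nbits)
    pad_val = value & ((1 << pad_bits) - 1)
    payload = ((value >> pad_bits) & ((1 << 8 * payload_len) - 1)).to_bytes(payload_len, "big")
    share_idx = CHARSET[value >> (pad_bits + 8 * payload_len)]
    return share_idx, payload, pad_val


blob = pack("d", bytes(range(16)), 3)
print(len(blob), unpack(blob, 16))
```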
Question 5:
What about the threshold? Where is this stored?
- With the bytes
- Implied by the length of the bytes list passed to the recovery method
Question 6:
Is there any identifier? How do you avoid combining bytes from different share sets together in cases you have more than one of these bytes backups?
With the answers to these questions I should be able to proceed with a solution.
I am happy to help, because what you want is very similar to my "Compact CodexQR" system, where I have to represent shares in 138-140 bits, leaving only 10-12 bits for thresholds, id, share_idx and padding. Clearly some user experience and functionality will be lost, but the trade-off is worth it for a smaller QR code.
The solution may be a "lossy" compact_qr_decode(codex) that returns 18 bytes representing the essential data of the share (enough to recover the secret seed) plus some trade-off of:
- basic mismatch detection by storing some identifier bits
- useful UX like the recovery threshold, and/or
- some or all 9 thresholds
I'd also try to give higher thresholds more identifier bits as they have k-1 opportunities for mismatching.
For the UI it's nice to store id bits in 5-bit chunks, so that whole bech32 characters can be displayed when scanning these Compact CodexQRs and they're interoperable with the regular shares. But even non-multiples of 5 still give some mismatch protection.
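A small sketch of the display idea, assuming the identifier is read from its most significant bits first and any bits beyond a multiple of 5 are kept as leftover bits that cannot be shown as a bech32 character. `id_display` is a hypothetical helper:

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def id_display(id_value: int, id_bits: int) -> tuple[str, int, int]:
    """Split id bits into whole bech32 characters plus (leftover value, leftover bit count)."""
    full, rem = divmod(id_bits, 5)
    leftover = id_value & ((1 << rem) - 1)
    head = id_value >> rem
    chars = "".join(CHARSET[(head >> (5 * (full - 1 - i))) & 31] for i in range(full))
    return chars, leftover, rem


print(id_display(0b0000011111, 10))  # 10 bits: two whole characters -> ('ql', 0, 0)
print(id_display(0b1111101, 7))      # 7 bits: one character + 2 leftover bits -> ('l', 1, 2)
```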